Abstract

The ggmulti package extends ggplot2 with high-dimensional visualization functionality, such as serial axes coordinates (e.g., parallel, …) and multivariate scatterplot glyphs (e.g., encoding many variables in a radial axes or star glyph). At the end, we render the static plots in a shiny app.

Keywords

ggplot2, high dimensional visualization, non-primitive point glyph, shiny

Serial Axes in ggplot2

Serial axes coordinates are a method for visualizing \(p\)-dimensional geometry and multivariate data. As the name suggests, all axes are shown in series. The axes can span a finite \(p\)-dimensional space or be transformed to an infinite space (e.g., by a Fourier transformation). In the finite \(p\) space, all axes can be displayed in parallel, which gives the parallel coordinate plot, or under a polar coordinate system, which gives what is often called the radial coordinate plot. In the infinite space, a mathematical transformation is often applied (Inselberg and Dimsdale 1990).

The data set used is “iris,” with four numerical variables and one categorical variable, “Species.” Each of the three species has 50 observations. The following plot shows parallel coordinates with axes “Sepal.Length,” “Sepal.Width,” “Petal.Length” and “Petal.Width,” coloured by “Species.”

library(ggmulti)
gp <- ggplot(iris,
             mapping = aes(
               Sepal.Length = Sepal.Length,
               Sepal.Width = Sepal.Width,
               Petal.Length = Petal.Length,
               Petal.Width = Petal.Width
             )) +
  geom_path(aes(colour = Species),
            alpha = 0.5) + 
  coord_serialaxes()  
gp

Also, one can create a radial coordinate plot by setting axes.layout = "radial" in coord_serialaxes().
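As a minimal sketch of the radial layout, the parallel coordinate plot can be rebuilt with only the coordinate call changed. The object name gr is illustrative and not part of the original text; ggplot2 is loaded explicitly in case it is not attached by ggmulti.

```r
library(ggplot2)
library(ggmulti)

# Same aesthetics and path layer as the parallel coordinate plot;
# only the axes layout passed to coord_serialaxes() differs.
gr <- ggplot(iris,
             mapping = aes(
               Sepal.Length = Sepal.Length,
               Sepal.Width = Sepal.Width,
               Petal.Length = Petal.Length,
               Petal.Width = Petal.Width
             )) +
  geom_path(aes(colour = Species),
            alpha = 0.5) +
  coord_serialaxes(axes.layout = "radial")
gr
```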

Besides lines, one-dimensional layers (e.g., geom_histogram, geom_density) or the quantile layer (geom_quantiles) can be added on each axis to reveal patterns of interest. For example, in the following figure, a histogram grouped by “Species” is drawn on each axis.

gp + 
  geom_histogram(aes(fill = Species),
                 alpha = 0.5)

The Andrews (1972) plot projects multi-response observations into a function \(f(t)\), defined as the inner product of the observed response values and a set of orthonormal functions in \(t\).

\[f_{\textbf{y}_i}(t) = \langle \textbf{y}_i, \textbf{a}_t \rangle\]

where \(\textbf{y}_i\) is the \(i\)th response vector and \(\textbf{a}_t\) is a set of orthonormal functions on a given interval. Andrews suggests using the Fourier transformation

\[\textbf{a}_t = \left\{\frac{1}{\sqrt{2}}, \sin(t), \cos(t), \sin(2t), \cos(2t), \ldots\right\}\]

which is orthonormal on the interval \((-\pi, \pi)\). To create an Andrews plot, one need only set the stat in geom_path() to "dotProduct".

ga <- ggplot(iris,
       mapping = aes(
         Sepal.Length = Sepal.Length,
         Sepal.Width = Sepal.Width,
         Petal.Length = Petal.Length,
         Petal.Width = Petal.Width
       )) +
  geom_path(aes(colour = Species),
            alpha = 0.5,
            stat = "dotProduct") + 
  coord_serialaxes()
ga

# Alternatively, one can modify the layer of `gp`
# gp$layers[[1]]$stat <- ggmulti::StatDotProduct
# gp

The default transformation function is the Fourier transformation; users can supply their own. More details are given in the ggmulti documentation.
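To make the inner product concrete, the following base-R sketch evaluates the Andrews function for a single observation. It is independent of ggmulti's internals: the helper names andrews_basis and andrews_curve are hypothetical, and ggmulti may scale the variables before transforming, so the values need not match the plotted curves.

```r
# Hypothetical helper: evaluate the first p Fourier basis functions
# 1/sqrt(2), sin(t), cos(t), sin(2t), cos(2t), ... at each value of t.
# Returns a length(t) x p matrix whose rows are the vectors a_t.
andrews_basis <- function(t, p) {
  basis <- matrix(1 / sqrt(2), nrow = length(t), ncol = p)
  for (j in seq_len(p)[-1]) {
    k <- j %/% 2                       # frequency: 1, 1, 2, 2, ...
    basis[, j] <- if (j %% 2 == 0) sin(k * t) else cos(k * t)
  }
  basis
}

# Hypothetical helper: f_y(t) = <y, a_t> for one observation y.
andrews_curve <- function(y, t) {
  drop(andrews_basis(t, length(y)) %*% y)
}

# Curve for the first iris observation on a grid over (-pi, pi)
t_grid <- seq(-pi, pi, length.out = 200)
y1 <- unlist(iris[1, 1:4])
f1 <- andrews_curve(y1, t_grid)
```

Applying andrews_curve() to every row of iris[, 1:4] and drawing each result against t_grid gives one curve per observation, which is the unscaled analogue of the Andrews plot.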

Non-primitive Glyphs

Glyphs can be used as point symbols in a scatterplot to convey more information on each point. This information could range from providing a more evocative picture for each point (e.g., an airplane for flight data or a team’s logo for sports data) to incorporating quantitative information (e.g., the values of other variables in a serial axes or star glyph or as a Chernoff face).

Serialaxes Glyph

The ggmulti package provides a variety of glyphs, e.g., the serialaxes glyph, polygon glyph and image glyph. Continuing with the “iris” data set, a scatterplot can be constructed with a parallel coordinate glyph. Note that the shape of each glyph (the filled polygon) is bounded above by the observation's line from the first parallel coordinate plot (ymax) and below by 0 (ymin).

gs <- ggplot(iris) +
  geom_serialaxes_glyph(
    mapping = aes(x = Sepal.Length, y = Sepal.Width, 
                  fill = Species),
    alpha = 0.4,
    serialaxes.data = iris[,1:4],
    axes.layout = "parallel"
  )
gs

Image Glyph

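library(png)
library(maps)
# Image files shipped with ggmulti (listed for reference; not used below)
img_path <- list.files(file.path(find.package(package = 'ggmulti'),
                                 "images"),
                       full.names = TRUE)
# The team logos are read from the working directory
Bucks <- png::readPNG("bucks.png")
Suns <- png::readPNG("suns.png")
# Coordinates (longitude, latitude) of Milwaukee and Phoenix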
Milwaukee <- data.frame(
  lon = -87.9065,
  lat = 43.0389
)

Phoenix <- data.frame(
  lon = -112.0740,
  lat = 33.4484
)

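# map_data() comes from ggplot2 and needs the maps package;
# passing it directly avoids relying on an unloaded pipe operator
ggplot(map_data("state"), aes(long, lat)) +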
  # US map
  geom_polygon(mapping = aes(group = group), 
               color="black", fill="cornsilk") + 
  # Milwaukee Bucks Icon
  geom_image_glyph(data = Milwaukee,
                   mapping = aes(x = lon, y = lat), 
                   images = Bucks, 
                   imagewidth = 1.8,
                   colour = NA,
                   size = 3) + 
  # Phoenix Suns Icon
  geom_image_glyph(data = Phoenix,
                   mapping = aes(x = lon, y = lat), 
                   imagewidth = 1, 
                   imageheight = 1, 
                   colour = NA,
                   size = 3,
                   images = Suns) + 
  ggtitle("2021 NBA Finals")

Before It Ends

The gp and ga objects show the parallel coordinate plot and the Andrews plot of the “iris” data, respectively. The gs object shows a scatterplot with a non-primitive “serialaxes glyph.” We can make these objects interactive and render them in a shiny web app.

# install.packages("loon.ggplot")
# or install the development version
# remotes::install_github("great-northern-diver/loon.ggplot")
library(loon.ggplot)
# install.packages("loon.shiny")
# or install the development version
# remotes::install_github("great-northern-diver/loon.shiny")
library(loon.shiny)
# turn `gp`, `ga` and `gs` into `loon` widgets
lp <- loon.ggplot::loon.ggplot(gp, linkingGroup = "iris")
la <- loon.ggplot::loon.ggplot(ga, linkingGroup = "iris")
ls <- loon.ggplot::loon.ggplot(gs, linkingGroup = "iris")
# render in `shiny`
loon.shiny::loon.shiny(
  list(ls, la, lp),
  layoutMatrix = matrix(c(rep(1, 12), rep(c(2, 2 ,3, 3), 2)), 
                        nrow = 5,
                        byrow = TRUE),
  # layout matrix is 
  # 1 1 1 1
  # 1 1 1 1
  # 1 1 1 1
  # 2 2 3 3
  # 2 2 3 3
  options = list(height = 500, width = 1000)
)

Shiny App

This app supports both direct and indirect manipulation of the plots.

Enjoy yourself!

References

Andrews, David F. 1972. “Plots of High-Dimensional Data.” Biometrics, 125–136.
Inselberg, A., and B. Dimsdale. 1990. “Parallel Coordinates: A Tool for Visualizing Multi-Dimensional Geometry.” In Proceedings of the First IEEE Conference on Visualization: Visualization '90, 361–378.
Waddell, Adrian, and R. Wayne Oldford. 2020. Loon: Interactive Statistical Data Visualization. http://great-northern-diver.github.io/loon/.